Abstract

We present a stochastic model for a social network, where new actors may join the
network, existing actors may become inactive and, at a later stage, reactivate themselves.
Our model captures the evolution of the network, assuming that actors attain new relations
or become active according to the preferential attachment rule. We derive the mean-field
equations for this stochastic model and show that, asymptotically, the distribution of actors
obeys a power-law distribution. In particular, the model applies to social networks such as
wireless local area networks, where users connect to access points, and peer-to-peer networks,
where users connect to each other. As a proof of concept, we demonstrate the validity of our
model empirically by analysing a public log containing traces from a wireless network at
Dartmouth College over a period of three years. Analysing the data processed according to
our model, we demonstrate that the distribution of user accesses is asymptotically a power-law
distribution.

1 Introduction 



We present a stochastic model for a social network [Sco00], where new actors may join the
network, existing actors may become inactive and, at a later stage, may reactivate themselves.
Our model captures the evolution of the network, assuming that actors attain new relations
or become active according to the preferential attachment rule. The concept of preferential
attachment, originating from [Pri76], has become a common theme in stochastic models of
networks [AB02, New05]. This behaviour often results in the "rich get richer" phenomenon,
for example, where new relations to existing actors are formed in proportion to the number
of relations those actors currently have.

The model presented incorporates the novel aspect of differentiating between active and
inactive actors, and allowing actors' status to change between active and inactive over time.
This type of network dynamics is especially relevant to situations where actors may
connect/disconnect or login/logout from the network, in particular, when network registration is
needed as a prior condition to the first time an actor connects to the network. The network
models proposed so far either assume that all actors are active, or that when actors leave the
network they do not rejoin it [ASBS00].



By deriving the mean-field equations for this model of a social network, we obtain the result
that, asymptotically, the distribution of actors obeys a power law. Power-law distributions
taking the form

\[ f(i) = C \, i^{-\phi}, \]

where $C$ and $\phi$ are positive constants, are abundant in nature [Sch91]. The constant $\phi$ is
called the exponent of the distribution. Examples of such distributions are: Zipf's law, which
states that the relative frequency of a word in a text is inversely proportional to its rank,
Pareto's law, which states that the number of people whose personal income is above a certain
level follows a power-law distribution with an exponent between 1.5 and 2 (Pareto's law is also
known as the 80:20 law, stating that about 20% of the population earn 80% of the income),
and Lotka's law, which states that the number of authors publishing a prescribed number of
papers is inversely proportional to the square of the number of publications.

Recently, several researchers have detected power-law distributions in the topology of
several networks such as the World-Wide-Web [BKM+00], e-mail networks [EMB02],
collaboration networks [Gro02, FLL06] and peer-to-peer networks [RIF02].



There are several examples of networks that can be modelled within our formalism. One
example is that of a wireless network [KE05], where mobile users having, e.g. a laptop, PDA
or mobile phone, connect to access points within a defined region (e.g. campus, building or
airport). In this case the actors are the users and the relations are between users and access
points. The user is active during a connection and otherwise inactive. Another example is
that of a peer-to-peer network [Ora01], where users (referred to as peers) connect to other
peers in order to exchange information. Peer-to-peer networks are of prime importance to the
future of the internet, as networks such as Bittorrent [PGES05], Kazaa [LKR06] and Skype
[GDJ06] are becoming increasingly popular and thus account for a sizeable amount of all
internet traffic.

Our stochastic model is based on the transfer of balls (representing actors) between urns 
(representing actor states), where we distinguish between active balls in regular, unstarred
urns and inactive balls in starred urns. The relationships of a particular actor are represented 
as pins attached to the corresponding ball. 

We note that our urn model is an extension of the stochastic model proposed by Simon in
his visionary paper published in 1955 [Sim55], which was couched in terms of word frequencies
in a text. Previously, in [FLL06], we considered an alternative extension of Simon's model
by adding a preferential mechanism for discarding balls from urns resulting in an exponential
cutoff in the power-law distribution.

In the model we present here, at each step of the stochastic process, with probability
$p$, two events may happen: either a new active ball is added to the first unstarred urn with
probability $r$, or with probability $1 - r$ an inactive ball is selected preferentially from a starred
urn and is activated by moving it to the corresponding unstarred urn. Alternatively, with
probability $1 - p$, an active ball is selected preferentially from an unstarred urn and then
two further events may happen: it is either moved along to the next unstarred urn with
probability $q$, or with probability $1 - q$ the selected ball becomes inactive by moving it to
the corresponding starred urn. We assume that a ball in the $i$th urn has $i$ pins attached to
it (which represents an actor having $i$ relations). Our main result is that the steady-state
distribution of this model is an asymptotic power law, and, moreover, as a proof of concept
we demonstrate the validity of our model by analysing data from a real wireless network.

The rest of the paper is organised as follows. In Section 2 we present an urn transfer
model allowing balls to be active or inactive by moving from starred urns to unstarred urns
and vice versa. We then derive in Section 3 the steady-state distribution of the model, which,
as stated earlier, follows an asymptotic power-law distribution. In Section 4 we show how we
can fit the parameters of the model to data, and in Section 5 we demonstrate how our model
can provide an explanation of the empirical distributions found in wireless networks. Finally,
in Section 6 we give our concluding remarks.



2 An Urn Transfer Model 

We now present an urn transfer model for a stochastic process that emulates the situation
where balls (which might represent actors) become inactive with a small probability, and can
later become active again with some probability. We assume that a ball in the $i$th urn has $i$
pins attached to it (which might represent the actors' relations). The model is an extension
of our previous model of exponential cutoff [FLL05], where balls are discarded with a small
probability.

We assume a countable number of (unstarred) urns, $urn_1, urn_2, urn_3, \ldots$ and correspondingly
a countable number of starred urns $urn_1^*, urn_2^*, urn_3^*, \ldots$, where the former contain the
active balls and the latter contain the inactive balls. Initially all of the urns are empty except
$urn_1$, which has one ball in it. Let $F_i(k)$ and $F_i^*(k)$ be the number of balls in $urn_i$ and $urn_i^*$,
respectively, at stage $k$ of the stochastic process, so $F_1(1) = 1$, all other $F_i(1) = 0$ and all
$F_i^*(1) = 0$. Then, at stage $k + 1$ of the stochastic process, where $k \ge 1$, one of two events
may occur:

(i) with probability $p$, $0 < p < 1$, one of two events may happen:

(a) with probability $r$, $0 < r < 1$, a new ball (with one pin attached to it) is inserted
into $urn_1$, or

(b) with probability $1 - r$, a starred urn is selected, with $urn_i^*$ being selected with
probability proportional to $iF_i^*(k)$, the number of pins it contains, and a ball is
chosen from the selected urn, $urn_i^*$, and transferred to $urn_i$ (this is equivalent to
making the ball active).

(ii) with probability $1 - p$ an urn is selected, with $urn_i$ being selected with probability
proportional to $iF_i(k)$, the number of pins it contains, and a ball is chosen from the
selected urn, $urn_i$; then,

(a) with probability $q$, $0 < q < 1$, the chosen ball is transferred to $urn_{i+1}$ (this is
equivalent to attaching an additional pin to the ball chosen from $urn_i$), or

(b) with probability $1 - q$ the ball chosen is transferred to $urn_i^*$ (this is equivalent to
making the ball inactive).
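
The two-level choice in steps (i) and (ii) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the experiments reported later: the function name `simulate_urns`, the dictionary representation of the urns and the `seed` argument are assumptions made here, and a step that would need to draw from an empty family of urns is simply skipped, which is also an assumption.

```python
import random
from collections import Counter

def simulate_urns(k, p, q, r, seed=0):
    """Run the urn transfer process up to stage k.

    Returns (active, inactive): Counters mapping urn index i to the
    number of balls in urn_i (unstarred) and urn_i^* (starred).
    """
    rng = random.Random(seed)
    active = Counter({1: 1})   # stage 1: a single ball in urn_1
    inactive = Counter()       # all starred urns start empty

    def pick(urns):
        # Select urn i with probability proportional to i * (balls in urn i),
        # i.e. proportional to the number of pins it contains.
        idx = [i for i, n in urns.items() if n > 0]
        if not idx:
            return None
        return rng.choices(idx, weights=[i * urns[i] for i in idx])[0]

    for _ in range(k - 1):
        if rng.random() < p:
            if rng.random() < r:
                active[1] += 1              # step (ia): new ball with one pin
            else:
                i = pick(inactive)          # step (ib): reactivate a ball
                if i is not None:           # assumption: skip if no inactive ball
                    inactive[i] -= 1
                    active[i] += 1
        else:
            i = pick(active)
            if i is None:                   # assumption: skip if no active ball
                continue
            active[i] -= 1
            if rng.random() < q:
                active[i + 1] += 1          # step (iia): attach one more pin
            else:
                inactive[i] += 1            # step (iib): deactivate the ball
    return active, inactive

# Illustrative run with hypothetical parameter values.
active, inactive = simulate_urns(k=10_000, p=0.1, q=0.9, r=0.5, seed=42)
print(sum(active.values()), sum(inactive.values()))
```

Rebuilding the weight list at every stage keeps the sketch short; a long run would want an incremental data structure for the preferential choice.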



We note that we could modify the initial conditions so that, for example, $urn_1$ and $urn_1^*$
initially contained $S, S^* \ge 1$ balls, respectively, instead of $urn_1$ having just one ball and $urn_1^*$
being empty. It can be shown, from the development of the model below, that any change
in the initial conditions will have no effect on the asymptotic distribution of the balls in the
urns as $k$ tends to infinity, provided the process does not terminate with either all of the
unstarred urns empty or all of the starred urns empty (cf. [FLL05]). In the former case we
need to ensure that $p > (1 - p)(1 - q)$, i.e. that the number of balls going into unstarred urns
is greater than the number of balls going out of unstarred urns. In the latter case we need to
ensure that $(1 - p)(1 - q) > p(1 - r)$, i.e. that the number of balls going into starred urns is
greater than the number of balls going out of starred urns.

More specifically, the probability of termination must be small, i.e.

\[ \left( \frac{(1-p)(1-q)}{p} \right)^{S} < \epsilon \]

and

\[ \left( \frac{p(1-r)}{(1-p)(1-q)} \right)^{S^*} < \epsilon \]

for some $\epsilon > 0$. We observe that these are the probabilities that the gambler's fortune will
not increase forever [Ros83].

The expected total number of balls in the unstarred urns at stage $k$ is given by

\[ E\Bigl( \sum_{i=1}^{k} F_i(k) \Bigr) = 1 + (k-1)\bigl( p - (1-p)(1-q) \bigr) = (1-p)(2-q) + k\bigl( p - (1-p)(1-q) \bigr), \tag{1} \]

and in the starred urns by

\[ E\Bigl( \sum_{i=1}^{k} F_i^*(k) \Bigr) = (k-1)\bigl( (1-p)(1-q) - p(1-r) \bigr). \tag{2} \]
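
As a quick numerical illustration of (1) and (2), the following sketch evaluates both expectations. The function name `expected_balls` and the parameter values are assumptions chosen for illustration only.

```python
def expected_balls(k, p, q, r):
    """Expected number of balls in the unstarred and starred urns at stage k,
    following equations (1) and (2)."""
    unstarred = 1 + (k - 1) * (p - (1 - p) * (1 - q))
    starred = (k - 1) * ((1 - p) * (1 - q) - p * (1 - r))
    return unstarred, starred

# Hypothetical parameter values satisfying p > (1-p)(1-q) > p(1-r).
k, p, q, r = 10**6, 0.1, 0.9, 0.5
unstarred, starred = expected_balls(k, p, q, r)
print(unstarred, starred, unstarred + starred)
```

Adding the two expectations gives $1 + (k-1)pr$, since a ball enters the system only in step (ia) and never leaves it.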



The total number of pins attached to balls in $urn_i$ at stage $k$ is $iF_i(k)$, so the expected
total number of pins in the unstarred urns is given by

\[ E\Bigl( \sum_{i=1}^{k} iF_i(k) \Bigr) = 1 + (k-1)\bigl( rp + (1-p)q \bigr) + p(1-r) \sum_{j=1}^{k-1} \psi_j - (1-p)(1-q) \sum_{j=1}^{k-1} \theta_j, \tag{3} \]

where $\psi_j$, $1 \le j \le k - 1$, is the expectation of $\psi'_j$, the number of pins attached to the ball
chosen at step (ib) of stage $j + 1$ (i.e. the urn number), and $\theta_j$, $1 \le j \le k - 1$, is the
expectation of $\theta'_j$, the number of pins attached to the ball chosen at step (iib) of stage $j + 1$
(i.e. the urn number). More specifically,

\[ \psi_j = E(\psi'_j) = E\left( \frac{\sum_{i=1}^{j} i^2 F_i^*(j)}{\sum_{i=1}^{j} i F_i^*(j)} \right) \tag{4} \]

and

\[ \theta_j = E(\theta'_j) = E\left( \frac{\sum_{i=1}^{j} i^2 F_i(j)}{\sum_{i=1}^{j} i F_i(j)} \right). \tag{5} \]

The quotient of sums in the second expectation in (4) (respectively in (5)), which we
denote by $\Psi_j$ (respectively by $\Theta_j$), is the expected value of $\psi'_j$ (respectively of $\theta'_j$) given the
state of the model at stage $j$.
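
The quotient of sums is easy to evaluate for a concrete state. The sketch below is illustrative only: the function name `expected_pins_of_chosen_ball` and the example state are assumptions. An urn is chosen with probability proportional to its pins, and a ball from $urn_i$ carries $i$ pins, which gives the ratio of the second to the first moment.

```python
from collections import Counter

def expected_pins_of_chosen_ball(urns):
    """Return sum(i^2 * F_i) / sum(i * F_i) for a state {urn index: ball count}."""
    total_pins = sum(i * n for i, n in urns.items())
    if total_pins == 0:
        raise ValueError("no balls to choose from")
    return sum(i * i * n for i, n in urns.items()) / total_pins

# Hypothetical state: two balls with one pin each and one ball with three pins.
state = Counter({1: 2, 3: 1})
print(expected_pins_of_chosen_ball(state))  # (2*1 + 1*9) / (2*1 + 1*3) = 2.2
```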
Correspondingly, the expected total number of pins in the starred urns is given by

\[ E\Bigl( \sum_{i=1}^{k} iF_i^*(k) \Bigr) = (1-p)(1-q) \sum_{j=1}^{k-1} \theta_j - p(1-r) \sum_{j=1}^{k-1} \psi_j. \tag{6} \]

Since at stage $j + 1$ there cannot be more than $j$ pins in the system, it follows that

\[ 1 \le \theta_j, \psi_j \le j. \]



Now let

\[ \theta^{(k)} = \frac{1}{k} \sum_{j=1}^{k} \theta_j \]

and

\[ \psi^{(k)} = \frac{1}{k} \sum_{j=1}^{k} \psi_j. \]

Since there are at least as many pins (starred pins) in the system as there are balls (starred
balls), it follows from (1) and (3), and (2) and (6), that

\[ (1-p)(1-q) - p(1-r) \le (1-p)(1-q)\theta^{(k)} - p(1-r)\psi^{(k)} \le (1-p) - p(1-r), \tag{7} \]

which implies that $\theta^{(k)} - \psi^{(k)}$ is bounded. This bounded difference will suffice for the purpose
of the developments in the next section and we will denote $\theta^{(\infty)}$ by $\theta$ and $\psi^{(\infty)}$ by $\psi$.

3 Derivation of the Steady State Distribution 



Following Simon [Sim55], we now state the mean-field equations for the urn transfer model.
For $i > 1$ we have

\[ E_k(F_i(k+1)) = F_i(k) + \beta_k \bigl( q(i-1)F_{i-1}(k) - iF_i(k) \bigr) + \alpha_k (1-r) i F_i^*(k), \tag{8} \]

where $E_k(F_i(k+1))$ is the expected value of $F_i(k+1)$ given the state of the model at stage
$k$, and

\[ \beta_k = \frac{1-p}{\sum_{i=1}^{k} iF_i(k)}, \tag{9} \]

\[ \alpha_k = \frac{p}{\sum_{i=1}^{k} iF_i^*(k)} \tag{10} \]

are the normalising factors.

Equation (8) gives the expected number of balls in $urn_i$ at stage $k + 1$. This is equal to
the previous number of balls in $urn_i$ plus the probability of adding a ball to $urn_i$ minus the
probability of removing a ball from $urn_i$, and finally plus the probability of transferring a
ball to $urn_i$ from $urn_i^*$.

The first probability is just that of preferentially choosing a ball from $urn_{i-1}$ and transferring
it to $urn_i$ in step (iia) of the stochastic process defined in Section 2, the second probability
is that of preferentially choosing a ball from $urn_i$ in step (iia) of the process, and the third
probability is that of preferentially transferring a ball from $urn_i^*$ to $urn_i$ in step (ib) of the
process.

In the boundary case, $i = 1$, we have

\[ E_k(F_1(k+1)) = F_1(k) + pr - \beta_k F_1(k) + \alpha_k (1-r) F_1^*(k). \tag{11} \]

Equation (11) gives the expected number of balls in $urn_1$ at stage $k + 1$, which is equal
to the previous number of balls in $urn_1$ plus the probability of inserting a new ball into
this urn in step (ia) of the stochastic process defined in Section 2 minus the probability
of preferentially choosing a ball from $urn_1$ in step (iia), and finally plus the probability of
preferentially transferring a ball to $urn_1$ from $urn_1^*$ in step (ib) of the process.

For starred urns, for $i \ge 1$, corresponding to (8) and (11), we have

\[ E_k(F_i^*(k+1)) = F_i^*(k) + \beta_k (1-q) i F_i(k) - \alpha_k (1-r) i F_i^*(k), \tag{12} \]

where $E_k(F_i^*(k+1))$ is the expected value of $F_i^*(k+1)$ given the state of the model at stage
$k$.

Equation (12) gives the expected number of balls in $urn_i^*$ at stage $k + 1$. This is equal to the
previous number of balls in $urn_i^*$ plus the probability of preferentially transferring a ball from
$urn_i$ to $urn_i^*$ in step (iib) of the stochastic process defined in Section 2 minus the probability
of preferentially transferring a ball from $urn_i^*$ to $urn_i$ in step (ib) of the process.



In order to solve the equations of the model, namely (8), (11) and (12), we make the
assumption that, for large $k$, the random variables $\beta_k$ and $\alpha_k$ can be approximated by
constant (i.e. non-random) values depending only on $k$. To this end we take the approximations
to be

\[ \hat{\beta}_k = \frac{1-p}{(k-1)\bigl( rp + (1-p)q + p(1-r)\psi^{(k-1)} - (1-p)(1-q)\theta^{(k-1)} \bigr)} \tag{13} \]

and

\[ \hat{\alpha}_k = \frac{p}{(k-1)\bigl( (1-p)(1-q)\theta^{(k-1)} - p(1-r)\psi^{(k-1)} \bigr)}. \tag{14} \]

The motivation for the above approximations is that the denominators in the definitions
of $\beta_k$ and $\alpha_k$ have been replaced by asymptotic approximations of their expectations as given
in (3) and (6), respectively. We note en passant that replacing $\beta_k$ by $\hat{\beta}_k$ and $\alpha_k$ by $\hat{\alpha}_k$ results
in an approximation similar to that of the "$p_k$ model" in [LFLW02], which is essentially a
"mean-field" approach.

We next take the expectations of (8), (11) and (12). By the linearity of the expectation
operator $E(\cdot)$, we obtain

\[ E(F_i(k+1)) = E(F_i(k)) + \hat{\beta}_k \bigl( q(i-1)E(F_{i-1}(k)) - iE(F_i(k)) \bigr) + \hat{\alpha}_k (1-r) i E(F_i^*(k)), \tag{15} \]

\[ E(F_1(k+1)) = E(F_1(k)) + pr - \hat{\beta}_k E(F_1(k)) + \hat{\alpha}_k (1-r) E(F_1^*(k)) \tag{16} \]

and

\[ E(F_i^*(k+1)) = E(F_i^*(k)) + \hat{\beta}_k (1-q) i E(F_i(k)) - \hat{\alpha}_k (1-r) i E(F_i^*(k)). \tag{17} \]



In order to obtain an asymptotic solution of (15), (16) and (17), we require that $E(F_i(k))/k$
and $E(F_i^*(k))/k$ converge to some values $f_i$ and $f_i^*$, respectively, as $k$ tends to infinity. Assume
for the moment that this is the case, then, provided the convergence is fast enough,
$E(F_i(k+1)) - E(F_i(k))$ tends to $f_i$ and $E(F_i^*(k+1)) - E(F_i^*(k))$ tends to $f_i^*$ as $k$ tends to infinity.
By "fast enough" we mean that $\epsilon_{i,k+1} - \epsilon_{i,k} = o(1/k)$ and $\epsilon^*_{i,k+1} - \epsilon^*_{i,k} = o(1/k)$ for large $k$,
where

\[ E(F_i(k)) = k(f_i + \epsilon_{i,k}) \quad \text{and} \quad E(F_i^*(k)) = k(f_i^* + \epsilon^*_{i,k}). \]



Now, letting

\[ \beta = \frac{1-p}{rp + (1-p)q + p(1-r)\psi - (1-p)(1-q)\theta}, \tag{18} \]

we see that $\hat{\beta}_k E(F_i(k))$ tends to $\beta f_i$ as $k$ tends to infinity, and letting

\[ \alpha = \frac{p}{(1-p)(1-q)\theta - p(1-r)\psi}, \tag{19} \]

we see that $\hat{\alpha}_k E(F_i^*(k))$ tends to $\alpha f_i^*$ as $k$ tends to infinity.

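So, letting $k$ tend to infinity, (15), (16) and (17) yield, for $i > 1$,

\[ f_i = \beta \bigl( q(i-1) f_{i-1} - i f_i \bigr) + \alpha (1-r) i f_i^*, \]

for $i = 1$,

\[ f_1 = pr - \beta f_1 + \alpha (1-r) f_1^*, \]

and for $i \ge 1$,

\[ f_i^* = \beta (1-q) i f_i - \alpha (1-r) i f_i^*, \]

whence

\[ f_i^* = \frac{\beta (1-q) i}{1 + \alpha (1-r) i} f_i \tag{20} \]

and

\[ f_1 = \frac{\varrho \, pr (\tau + 1)}{(\varrho + 1)(\tau + 1) - (1-q)}, \tag{21} \]

where $\varrho = 1/\beta$ and $\tau = 1/(\alpha(1-r))$. Hence

\[ f_i = \beta \bigl( q(i-1) f_{i-1} - i f_i \bigr) + \frac{\alpha \beta (1-r)(1-q) i^2}{1 + \alpha (1-r) i} f_i \]

and thus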



\[ f_i = \frac{q (i-1)(\tau + i)}{(\varrho + i)(\tau + i) - (1-q) i^2} f_{i-1}. \tag{22} \]

On using (22) repeatedly, the solution for $f_i$ is given by

\[ f_i = \frac{\varrho \, pr}{q} \, \frac{\Gamma(i) \Gamma(i + \tau + 1) \Gamma(x + y + 1) \Gamma(x - y + 1)}{\Gamma(\tau + 1) \Gamma(i + x + y + 1) \Gamma(i + x - y + 1)}, \tag{23} \]

where

\[ x = \frac{\varrho + \tau}{2q}, \qquad y = \frac{\bigl( (\varrho + \tau)^2 - 4 q \varrho \tau \bigr)^{1/2}}{2q}, \]

and $\Gamma$ is the gamma function [AS72, 6.1].



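Thus for large $i$, on using the asymptotic expansion of the ratio of two gamma functions
[AS72, 6.1.47], we obtain

\[ f_i \sim \frac{C}{i^{\frac{\varrho + (1-q)\tau}{q} + 1}}, \tag{24} \]

where $\sim$ means is asymptotic to and

\[ C = \frac{\varrho \, pr \, \Gamma(x + y + 1) \Gamma(x - y + 1)}{q \, \Gamma(\tau + 1)}. \tag{25} \]

Moreover, it can easily be verified from (20) that

\[ f_i^* = \frac{1-q}{\varrho (1/i + 1/\tau)} f_i \tag{26} \]

and, from (24) and (26), it follows that

\[ f_i + f_i^* \sim \frac{C}{i^{\frac{\varrho + (1-q)\tau}{q} + 1}} \left( 1 + \frac{1-q}{\varrho (1/i + 1/\tau)} \right). \]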
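
The gamma-function solution can be evaluated without overflow by working with logarithms of gamma functions. The sketch below is a hedged illustration, not the authors' code: the function names are assumptions, and the prefactor $\varrho pr / q$ is the one obtained here by iterating the recurrence (22) starting from (21). The parameter values in the usage lines are the estimates quoted for the wireless data in Section 5.

```python
from math import exp, lgamma, sqrt

def roots(q, rho, tau):
    """The quantities x and y appearing in the gamma-function solution."""
    x = (rho + tau) / (2 * q)
    y = sqrt((rho + tau) ** 2 - 4 * q * rho * tau) / (2 * q)
    return x, y

def f(i, p, q, r, rho, tau):
    """Steady-state value f_i, computed in log space to avoid overflow."""
    x, y = roots(q, rho, tau)
    log_ratio = (lgamma(i) + lgamma(i + tau + 1)
                 + lgamma(x + y + 1) + lgamma(x - y + 1)
                 - lgamma(tau + 1)
                 - lgamma(i + x + y + 1) - lgamma(i + x - y + 1))
    return (rho * p * r / q) * exp(log_ratio)

def f_star(i, p, q, r, rho, tau):
    """Steady-state value f_i^* for the starred urns, following (26)."""
    return (1 - q) * f(i, p, q, r, rho, tau) / (rho * (1 / i + 1 / tau))

def exponent(q, rho, tau):
    """Exponent of the asymptotic power law in (24)."""
    return (rho + (1 - q) * tau) / q + 1

print(exponent(q=0.8897, rho=0.1244, tau=6.9704))
print(f(1000, p=0.0994, q=0.8897, r=0.0046, rho=0.1244, tau=6.9704))
```

Since $y < x$, every argument passed to `lgamma` is positive, so the logarithms are well defined for all $i \ge 1$.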



4 Fitting the Parameters of the Model 

In order to validate the model we use the equations we have derived in Section 3 to fit the
parameters of the model. As a first step we validate the model through stochastic simulation,
and then, in Section 5, we provide a proof of concept on a real wireless network.

We note that the full set of parameters will, generally, be unknown for real data sets. The
output from each simulation run is the set of unstarred and starred urns, from which we can
infer $balls_k$ and $balls_k^*$, the expected number of balls at stage $k$ in the unstarred and starred
urns, respectively, and $pins_k$ and $pins_k^*$, the expected number of pins in the unstarred and
starred urns, respectively. We are also able to derive approximations for $balls_k$ and $balls_k^*$,
separately, and similarly for pins, based on their definitions in Section 2.

From the formulation of the model in Section 2 we have

\[ \frac{balls_k + balls_k^*}{k} \approx pr, \tag{27} \]

where the right-hand side of (27) is the limiting value of the left-hand side as $k$ tends to
infinity. Similarly, we have,

\[ \frac{pins_k + pins_k^*}{k} \approx pr + (1-p)q. \tag{28} \]

As a result, we can compute the branching factor, $bf$, as

\[ bf = \frac{pins_k + pins_k^*}{balls_k + balls_k^*}, \]

which eliminates $k$, and derive

\[ p \approx \frac{q}{q + r(bf - 1)}. \tag{29} \]

The value of the parameter $\varrho$ can be computed from

\[ \varrho \approx \frac{pins_k}{k(1-p)}, \tag{30} \]

which follows from (9) and the fact that $\varrho \approx (k\beta_k)^{-1}$. Similarly, $\tau$ can be computed from

\[ \tau \approx \frac{pins_k^*}{kp(1-r)}, \tag{31} \]

which follows from (10) and the fact that $\tau \approx (k\alpha_k(1-r))^{-1}$. Moreover, the value of the
constant $C$ can be derived from (25), given $p$, $q$, $r$, $\varrho$ and $\tau$.
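
These estimators amount to a few ratios of the observed totals. The following sketch shows them side by side. The function names are assumptions, and the estimator for $p$ is the expression obtained here by dividing (28) by (27) and solving for $p$. The usage lines plug in the pin counts reported for the simulation run with $p = 0.2$ and $r = 0.7$ later in this section.

```python
def estimate_p(total_balls, total_pins, q, r):
    """Estimate p from the branching factor bf = total pins / total balls.

    Dividing (28) by (27) gives bf = (pr + (1-p)q) / (pr); solving for p
    yields p = q / (q + r (bf - 1)).
    """
    bf = total_pins / total_balls
    return q / (q + r * (bf - 1))

def estimate_rho_tau(pins, pins_star, k, p, r):
    """Estimate rho and tau from the pin totals, following (30) and (31)."""
    rho = pins / (k * (1 - p))
    tau = pins_star / (k * p * (1 - r))
    return rho, tau

rho, tau = estimate_rho_tau(pins=658273, pins_star=201521, k=10**6, p=0.2, r=0.7)
print(round(rho, 4), round(tau, 4))
```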



To fit the parameters we can now numerically minimise the least squares of

\[ \sum_{i=1}^{m} \bigl( |urn_i| - k f_i \bigr), \tag{32} \]

where $k$ is the number of steps in the simulation, $|urn_i|$ denotes the number of balls in $urn_i$,
$m$ denotes the number of urns over which the minimisation takes place and $f_i$ is given by
(23), in order to estimate one or more of the parameters given knowledge of the others. (For
a justification of choosing $m$ to be the first gap in the urn set, i.e. such that from $i = 1$ to $m$
$urn_i$ is non-empty and $urn_{m+1}$ is empty, see [FLL05].)
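
When every parameter except an overall scale is held fixed, this minimisation has a closed form. The sketch below assumes that the objective is the sum of squared differences between the urn counts and a scale $c$ times the $i$-dependent gamma ratio of the steady-state solution. The names `gamma_ratio` and `fit_scale` are hypothetical, and this is not the procedure used to produce the figures reported below.

```python
from math import exp, lgamma, sqrt

def gamma_ratio(i, q, rho, tau):
    """i-dependent part of f_i:
    Gamma(i) Gamma(i+tau+1) / (Gamma(i+x+y+1) Gamma(i+x-y+1)),
    evaluated through log-gammas to avoid overflow."""
    x = (rho + tau) / (2 * q)
    y = sqrt((rho + tau) ** 2 - 4 * q * rho * tau) / (2 * q)
    return exp(lgamma(i) + lgamma(i + tau + 1)
               - lgamma(i + x + y + 1) - lgamma(i + x - y + 1))

def fit_scale(counts, q, rho, tau):
    """Scale c minimising sum_i (counts[i-1] - c * g_i)^2, where g_i is the
    gamma ratio; setting the derivative to zero gives c = sum(n g) / sum(g^2)."""
    g = [gamma_ratio(i, q, rho, tau) for i in range(1, len(counts) + 1)]
    return sum(n * gi for n, gi in zip(counts, g)) / sum(gi * gi for gi in g)

# Synthetic check with hypothetical values: counts generated with scale 5.
counts = [5.0 * gamma_ratio(i, 0.9, 0.8, 3.4) for i in range(1, 21)]
print(fit_scale(counts, 0.9, 0.8, 3.4))
```

Estimating $\varrho$ or $\tau$ is not linear in the same way and needs a numerical optimiser.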



We note that we have chosen to do a direct numerical minimisation rather than use a
regression tool on the log-log transformation of the urn data and try to fit a power-law
distribution, since fitting power-law distributions is problematic [GMY04]. Moreover, the $f_i$'s
in our model obey only asymptotically a power-law distribution and therefore we preferred
to fit the "correct" distribution with the ratio of gamma functions, as given in (23).



To validate the simulation we fixed the input parameters $p$, $q$, $r$ and $k$ and simulated the
model in Matlab as described at the beginning of Section 2. We fixed $q = 0.9$ and the number
of simulation steps to be $k = 10^6$, and varied $p$ and $r$.

We first set $p = 0.1$ and $r = 0.5$. A typical output of the simulation run produced
$balls_k = 10762$, $balls_k^* = 39200$, $pins_k = 77452$ and $pins_k^* = 39200$. The left-hand side of
(27) gives an approximation of $pr$ as 0.05, while its right-hand side gives the same value.
Correspondingly, the left-hand side of (28) gives an approximation of $pr + (1 - p)q$ as 0.8602,
while its right-hand side gives the value 0.86. Finally, the left-hand side of (29) is just $p$, while
its right-hand side gives the approximated value $p = 0.0999$.

Computing an estimate of $\varrho$ from (30) gives 0.0861, while an estimate of $\tau$ from (31)
gives 15.6541. In order to estimate $\varrho$ and $\tau$ from the urn data, we first fixed all the parameters
in (23) apart from $C$ of (25), which we estimated, using (32), to be $C = 651950$. We then
fixed $C$, given in (25), and numerically estimated $\varrho$ and $\tau$ in turn obtaining $\varrho = 0.0865$ and
$\tau = 15.6541$.

We next set $p = 0.2$ and $r = 0.7$. A typical simulation run produced $balls_k = 122179$,
$balls_k^* = 18997$, $pins_k = 658273$ and $pins_k^* = 201521$. The left-hand side of (27) gives an
approximation of $pr$ as 0.1406, while its right-hand side gives the value $pr = 0.14$. The left-hand
side of (28) gives an approximation of $pr + (1 - p)q$ as 0.8594, while its right-hand side
gives the value 0.86. Finally, the left-hand side of (29) is just $p$, while its right-hand side gives
the approximated value $p = 0.2009$.

Computing an estimate of $\varrho$ from (30) gives 0.8228, while an estimate of $\tau$ from (31) gives
3.3587. In order to estimate $\varrho$ and $\tau$ from the urn data, we first fixed all the parameters in
(23) apart from $C$ of (25), which we estimated, using (32), to be $C = 15742$. We then fixed
$C$ in (23) and numerically estimated $\varrho$ and $\tau$ in turn obtaining $\varrho = 0.7983$ and $\tau = 3.35$.
Additional runs of the simulation produced similar results in terms of their accuracy. We
note that we limited $m$ in (32) so that its maximum value be 90, due to numerical overflow
of the product of gamma functions for larger values of $m$.

The simulations demonstrate that, given that the data is consistent with the urn transfer
model we have defined in Section 2, numerical optimisation can be used to accurately estimate
the parameters of the model.



5 Real Social Networks 

As a proof of concept we made use of a public log containing traces of the activity of users
within a campus-wide WLAN network recorded by the Crawdad project
(http://crawdad.cs.dartmouth.edu) at the Center for Mobile Computing at Dartmouth College [KH05]. The
data set we elected to work with was collected during 2001-2003 using the syslog system event
logging facility available on the wireless access points. Each access point was configured so as
to transmit a message logged at one of two dedicated servers maintained by the project, every
time a client card authenticated, associated, reassociated, disassociated or deauthenticated
with the access point. In total, approximately 13.5 million events were recorded during
this period.

In the syslog records, client cards are identified by their MAC address. It should be 
noted that there is no one-to-one relationship between card addresses, devices and users, as 
in some cases one card may have been used with more than one device and one device may 
have been using more than one card. Moreover, a user may be using more than one device. 
Mobility traces were computed from the raw syslog messages for each device. A special access 
point name signifies that a card is not connected to the wireless network. This condition was 
determined by the syslog message "Disauthentication" from the last associated access point 
with reason field "Inactivity". Such messages are commonly generated when the card is 
inactive for 30 minutes. For simplicity, from now on, we will refer to a client card as a user. 

In Figure 1 we show the log-log plot of the number of accesses of the active and inactive
users at the end of the trace period. From the figure we may conjecture an asymptotic
power-law distribution, but as can be seen the tails are very fuzzy and therefore regression or
maximum likelihood methods are unlikely to succeed [GMY04]. For this reason, as mentioned
in Section 4, we preferred to estimate the parameters of the model numerically via least squares
minimisation.

Our model is fully specified by the four input parameters $p$, $q$, $r$ and $k$, as described in
Section 2. Of particular interest are the following probabilities:

(1) $pr$, which is the rate at which new users join the network and attain their first wireless
connection.

(2) $p(1 - r)$, which is the rate at which inactive users become active again.

(3) $(1 - p)q$, which is the rate at which active users attain a new wireless connection without
first disconnecting from the network.

(4) $(1 - p)(1 - q)$, which is the rate at which active users become inactive.

(5) $k$, which can be viewed as the life of the network, assuming that the evolution takes
place in discrete time steps, where at each time a single change occurs in the network
according to the urn transfer model described in Section 2.

We processed the Dartmouth data set so that it contains pairs of users and their activity,
where each user is identified by a client card and an activity corresponds to (1), (2), (3) or
(4) above. We then estimated the probabilities $p$, $r$ and $q$ from the data, taking $k$ to be the
number of pairs processed. From this we obtained $p = 0.0994$, $r = 0.0046$, $q = 0.8897$ and
$k = 13559701$.

Next we estimated $\varrho$ from (30) and $\tau$ from (31), obtaining $\varrho = 0.1244$ and $\tau = 6.9704$.
Using (24) and (25) we estimated the exponent of the asymptotic power-law distribution as

\[ \frac{\varrho + (1-q)\tau}{q} + 1 = 2.0040. \]



As a validation of the model we populated the unstarred and starred urns according to
the activity pairs from the processed data set. Then, using the methodology described in
Section 4, we numerically minimised the least squares of the sum over $i$ of the differences
between the number of balls in $urn_i$, respectively $urn_i^*$, and the predicted number of balls
according to (23), and respectively (26), in accordance with (32). The fitted parameters we
obtained from the unstarred urns using (23) were: $q = 0.8901$, $\varrho = 0.1101$ and $\tau = 6.9648$,
obtaining $(\varrho + (1 - q)\tau)/q + 1 = 1.9836$. The corresponding set of fitted parameters obtained
from the starred urns using (26) were: $q = 0.8898$, $\varrho = 0.1385$ and $\tau = 6.9473$, obtaining
$(\varrho + (1 - q)\tau)/q + 1 = 2.0161$. As can be seen, the fitted parameters are consistent with the
ones we have mined from the original data set.

As a further validation of the model we ran a simulation implemented in Matlab according
to the description of the stochastic process in Section 2 with the parameters $k = 13559701$,
$p = 0.0994$, $r = 0.0046$ and $q = 0.8897$ as mined from the data set. We note that

\[ p = 0.0994 > (1-p)(1-q) = 0.0993 \]

and

\[ (1-p)(1-q) = 0.0993 > p(1-r) = 0.0989 \]

as required in the specification of the stochastic process in Section 2. So, for the probability
of termination, with either all starred or unstarred urns being empty, to be less than 0.1 we
should set the initial number of balls in $urn_1$ to be $S = 3600$, and the initial number of balls
in $urn_1^*$ to be $S^* = 600$. We verified this by running a simplified version of the simulation,
which only accounts for the total number of balls in starred and unstarred urns. Out of
ten simplified simulation runs with the above input parameters none terminated with all the
unstarred or starred urns being empty.
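
The choice of $S$ and $S^*$ can be checked numerically. The sketch below assumes that the termination probabilities take the gambler's-ruin form, i.e. the ratio of the outflow rate to the inflow rate of each family of urns raised to the initial number of balls. The function name `termination_probabilities` is hypothetical.

```python
def termination_probabilities(p, q, r, S, S_star):
    """Gambler's-ruin style bounds on the probability that the unstarred
    (respectively starred) urns ever become empty, starting from S
    (respectively S_star) balls."""
    out_unstarred = (1 - p) * (1 - q)   # rate at which active balls become inactive
    out_starred = p * (1 - r)           # rate at which inactive balls are reactivated
    if not p > out_unstarred > out_starred:
        raise ValueError("need p > (1-p)(1-q) > p(1-r)")
    return (out_unstarred / p) ** S, (out_starred / out_unstarred) ** S_star

# Parameters mined from the data set, with the initial numbers of balls quoted above.
print(termination_probabilities(p=0.0994, q=0.8897, r=0.0046, S=3600, S_star=600))
```

Under this assumption both values come out just below 0.1, in line with the choice of $S = 3600$ and $S^* = 600$.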

We decided in our simulation to ignore the problem of empty urns, the justification being
that having empty urns at some stage of the stochastic process does not have much effect on
the exponent of the asymptotic power-law distribution, since by (30) and (31) the exponent
given in (24) is approximately proportional to $pins_k + pins_k^*$, and by (28) the total number
of pins depends only on the input parameters through independent random variables.

From $pins_k$ and $pins_k^*$ output from the simulation we computed $\varrho = 0.1054$ from (30),
$\tau = 7.1442$ from (31), and finally the exponent of the asymptotic power-law distribution was
computed as $(\varrho + (1 - q)\tau)/q + 1 = 2.0042$. As can be seen, the output from the simulation
is consistent with the parameters mined from the data; a second simulation with the same
input parameters produced similar results.

Overall, on the evidence from the computational results, the urn transfer model described
in Section 2 is a viable model for a real social network, specifically for the access patterns of
users within the Dartmouth wireless network.



6 Concluding Remarks 

We have presented an extension of Simon's classical stochastic process where each actor can
be either in an active or an inactive state. Actors, chosen by preferential attachment, may
attain a new relation, become inactive or later become active again. The system is closed
in the sense that once an actor enters the system he remains within the system. We have
shown in (24) and (26) that, asymptotically, the number of active and inactive actors having
a prescribed number of relations is a power-law distribution. As a proof of concept we validated
the model on a real data set of wireless accesses over a lengthy period of time. The validation
made use of numerical optimisation rather than using standard regression tools, due to the
known difficulty of detecting asymptotic power-law distributions in data.

The stochastic model we have presented is relevant to social networks where users may be 
active or inactive at different times. Two such real-world networks are wireless networks and 
peer-to-peer networks, although it remains to validate our model on a real peer-to-peer data 
set. In fact, our model could also be used to model user activity in an e-commerce portal or 
an online forum, where registration is required. 